The complete chloroplast genome of Prunus phaeosticta (Hance) Maxim. (Rosaceae) and its phylogenetic implications

Abstract
The complete chloroplast (cp) genome of Prunus phaeosticta (Hance) Maxim. was characterized by reference-based assembly of Illumina paired-end data. The circular cp genome is 158,752 bp in length, comprising a large single-copy (LSC) region of 87,085 bp, a small single-copy (SSC) region of 18,923 bp, and a pair of inverted repeats (IRs) of 26,372 bp each. A total of 129 functional genes were identified, including 84 protein-coding genes, 37 tRNA genes, and 8 ribosomal RNA genes. Phylogenetic analysis showed that P. phaeosticta is closely related to Prunus zippeliana.


Introduction
Prunus L. (Prunus sensu lato) is well known for its taxonomic complexity, and the phylogenetic relationships among its subgenera, sections, and species remain unclear (Wen et al. 2008; Shi et al. 2013; Chin et al. 2014). The most recent classification of the genus recognized only three subgenera, Prunus, Cerasus (Mill.) A. Gray, and Padus (Mill.) Peterm., with a broader concept of subg. Padus that included Laurocerasus Duhamel and the former genera Maddenia Hook. f. & Thoms and Pygeum Gaertn. (Shi et al. 2013). Recent phylogenetic studies revealed Prunus subg. Padus to be polyphyletic: taxa of subg. Padus and subg. Laurocerasus are highly intermixed in phylogenetic trees reconstructed from a limited set of short DNA sequences (Liu et al. 2013; Chin et al. 2014; Zhao et al. 2016). The complete chloroplast genome is a robust tool that can resolve the phylogeny and evolutionary history of plants much better than universal DNA markers (Gitzendanner et al. 2018; Do et al. 2020; Su et al. 2021). However, genomic resources for Laurocerasus species are extremely scarce (Zou et al. 2019). To better understand the molecular phylogenetic relationships within subg. Padus, we used next-generation sequencing to assemble and characterize the first complete chloroplast genome of Prunus phaeosticta (Hance) Maxim. 1883. We chose this species because it is one of the most widespread species, ranging from India and Thailand to East China, and comprises seven forms; sequencing it contributes genetic resources and improves our understanding of this polymorphic species. We also present a phylogenomic analysis of 14 Prunus species. These results lay a solid foundation for future phylogenetic research on P. sensu lato.

Plant materials and DNA extraction
Fresh young leaves of P. phaeosticta (Figure 1) were collected from Mt. Jiulong, Suichang County, Zhejiang Province, China (28.3850°N, 118.8927°E). A voucher specimen and its DNA sample were deposited at the Herbarium of Zhejiang University (HZU, https://www.zju.edu.cn/; collector: Pan Li, panli_zju@126.com) under voucher number LP208067. Genomic DNA was extracted using a modified CTAB protocol (Doyle and Doyle 1987).

Genome organization and compositions
The chloroplast genome of P. phaeosticta is a typical circular DNA molecule with a total length of 158,752 bp (Figure 2). It has the characteristic quadripartite structure, with a large single-copy (LSC) region of 87,085 bp, a small single-copy (SSC) region of 18,923 bp, and a pair of inverted repeats (IRs) of 26,372 bp each. The total GC content of the chloroplast genome is 38.6%. The GC content of the IR regions (43.4%) is higher than that of the LSC (37.0%) and SSC (33.4%) regions.
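Because a quadripartite plastome contains the inverted repeat twice, the region sizes reported above should satisfy LSC + SSC + 2 × IR = total, and they do: 87,085 + 18,923 + 2 × 26,372 = 158,752 bp. The short Python sketch below makes this check explicit and adds a simple GC-content function. The function names are hypothetical, and this is an illustration rather than the assembly or annotation pipeline used in this study; counting GC only over unambiguous A/C/G/T bases is an assumption, since tools differ in how they handle ambiguity codes such as N.

```python
# Minimal sketch (hypothetical helper names, not the study's pipeline).


def quadripartite_length(lsc: int, ssc: int, ir: int) -> int:
    """Total plastome length: LSC + SSC + two copies of the inverted repeat."""
    return lsc + ssc + 2 * ir


def gc_content(seq: str) -> float:
    """Percent G+C among unambiguous A/C/G/T bases (other symbols are ignored)."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    if acgt == 0:
        raise ValueError("sequence contains no A/C/G/T bases")
    gc = seq.count("G") + seq.count("C")
    return 100.0 * gc / acgt


# Region lengths (bp) reported for P. phaeosticta.
total = quadripartite_length(lsc=87_085, ssc=18_923, ir=26_372)
print(total)  # 158752

# Toy sequence for illustration only: 4 of 8 bases are G or C.
print(gc_content("ATGCGCAT"))  # 50.0
```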

Phylogenetic analysis
A robust phylogeny of subg. Padus was reconstructed from the coding sequence (CDS) data (Figure 3). In this tree, the 14 species clustered into two main clades: 1) the Laurocerasus clade; 2) the Maddenia & Padus clade. The Maddenia species formed a monophyletic clade, whereas the Padus species formed a paraphyletic group (Figure 3). P. phaeosticta and P. zippeliana both clustered within the Laurocerasus clade, and the two species were recovered as sisters.
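The monophyly and paraphyly statements above are properties of the tree topology: a group is monophyletic if some clade of the rooted tree contains exactly its members and nothing else. The following Python sketch tests that condition on a rooted tree written as nested tuples. The tree encoding, the function names, and the placeholder taxa A to E are hypothetical illustrations, not the tree in Figure 3 or the software used for the analysis. A group that fails the test may be paraphyletic or polyphyletic; this sketch does not distinguish the two.

```python
from typing import FrozenSet, List, Set, Tuple, Union

# Hypothetical encoding: a rooted tree as nested tuples, leaves as taxon names.
Tree = Union[str, Tuple["Tree", ...]]


def clade_leaf_sets(tree: Tree) -> List[FrozenSet[str]]:
    """Collect the set of leaves below every node (every clade) of a rooted tree."""
    sets: List[FrozenSet[str]] = []

    def walk(node: Tree) -> FrozenSet[str]:
        if isinstance(node, str):
            leaves = frozenset([node])
        else:
            leaves = frozenset().union(*(walk(child) for child in node))
        sets.append(leaves)
        return leaves

    walk(tree)
    return sets


def is_monophyletic(tree: Tree, group: Set[str]) -> bool:
    """True if some clade contains exactly the taxa in `group` and nothing else."""
    target = frozenset(group)
    return any(leaves == target for leaves in clade_leaf_sets(tree))


# Placeholder topology ((A, B), (C, (D, E))); NOT the tree in Figure 3.
toy = (("A", "B"), ("C", ("D", "E")))
print(is_monophyletic(toy, {"A", "B"}))  # True
print(is_monophyletic(toy, {"C", "D"}))  # False: smallest enclosing clade also holds E
```

Because clades are compared as exact leaf sets, the result does not depend on the order of children within a node.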

Discussion
As described above, Prunus L. (P. sensu lato) is well known for its taxonomic complexity. Although several comprehensive studies of the phylogenetic relationships of P. sensu lato have significantly advanced our understanding of interspecific relationships in this genus (Wen et al. 2008; Liu et al. 2013; Shi et al. 2013; Chin et al. 2014; Zhao et al. 2016), the scarcity of genome-level data has remained an impediment to resolving the phylogeny and interspecific relationships of P. sensu lato. In this study, we sequenced the chloroplast genome of P. phaeosticta and analyzed it phylogenetically together with 13 other Prunus chloroplast genomes. Our results are consistent with recent phylogenetic studies of P. sensu lato: the monophyly of Maddenia was supported and the Padus species were paraphyletic (Shi et al. 2013; Chin et al. 2014; Su et al. 2021). P. phaeosticta clustered within the Laurocerasus group and was most closely related to P. zippeliana. We expect the cp genome of P. phaeosticta to be a valuable genomic resource for future taxonomic and phylogenetic studies of P. sensu lato and for breeding new Prunus cultivars.

Ethical approval
The material used in this study raises no ethical concerns. This study was permitted by Jinhua Academy of Agricultural Sciences and Taizhou University, China. All collection and sequencing work was conducted in strict accordance with local legislation and relevant laboratory regulations for the protection of wild resources.

Author contributions
The study was conceived and designed by Jai-Qi Wu and Jian-Sheng Shen; Yi Wang and Zhong-Shuai Sun assembled and annotated the cp genome; Jai-Qi Wu contributed significantly to the phylogenetic analysis and manuscript preparation; Jian-Sheng Shen was involved in interpretation of the data and revised the manuscript critically for intellectual content. All authors approved the final version to be published and agreed to be accountable for all aspects of the work.

Disclosure statement
No potential conflict of interest was reported by the author(s).

Data availability statement
The genome sequence data that support the findings of this study are openly available in NCBI GenBank (https://www.ncbi.nlm.nih.gov/) under accession number ON557295. The associated BioProject, SRA, and BioSample accession numbers are PRJNA841240, SRR19348248, and SAMN28578094, respectively.
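For readers who want the plastome record itself, the GenBank entry can be retrieved programmatically through the NCBI E-utilities efetch service. The Python sketch below uses only the standard library; the helper names are hypothetical, and NCBI asks frequent users to add `tool` and `email` parameters (or an API key), which are omitted here for brevity. The raw reads (SRR19348248) are distributed through the SRA, typically downloaded with NCBI's SRA Toolkit, and are not covered here.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# NCBI E-utilities efetch endpoint.
EFETCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def efetch_url(accession: str, rettype: str = "gb") -> str:
    """Build an efetch URL for a nucleotide record (GenBank flat file by default)."""
    params = {"db": "nuccore", "id": accession, "rettype": rettype, "retmode": "text"}
    return f"{EFETCH}?{urlencode(params)}"


def fetch_record(accession: str, rettype: str = "gb") -> str:
    """Download the record as text (requires network access)."""
    with urlopen(efetch_url(accession, rettype)) as response:
        return response.read().decode("utf-8")


if __name__ == "__main__":
    print(efetch_url("ON557295"))
    # Uncomment to download the GenBank record (network access required):
    # print(fetch_record("ON557295")[:200])
```

Passing `rettype="fasta"` requests the sequence alone instead of the annotated flat file.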